Dockser, L Kissel, MK Seager, JS Vetter, K Yates 

November 2002 Proceedings of the 2002 ACM/IEEE conference on Supercomputing 



Full text available: Additional Information: full citation, abstract, references, citings, index terms



This paper gives an overview of the BlueGene/L Supercomputer. This is a jointly funded 
research partnership between IBM and the Lawrence Livermore National Laboratory as part 
of the United States Department of Energy ASCI Advanced Architecture Research Program. 
Application performance and scaling studies have recently been initiated with partners at a 
number of academic and government institutions, including the San Diego Supercomputer 
Center and the California Institute of Technology. This mass ... 



2 Tramp: An interpretive associative processor with deductive capabilities 
William L. Ash, Edgar H. Sibley 

January 1968 Proceedings of the 1968 23rd ACM national conference 

Full text available: pdf(1.04 MB) Additional Information: full citation, abstract, references, citings, index terms

In recent years, it has become increasingly clear that there is need for a content- 
addressable computer memory. Larger and larger programs are being written which require 
a structured data base to operate with any efficiency. Many of these could well benefit by 
replacing tedious searches with a fast, efficient, "content-addressable" access of the data 
store. A good example is the "key-word" library search. If one asks for a list of the books 
written by J. von Neumann ... 
Jun Yang, Rajiv Gupta, Chuanjun Zhang 

July 2004 ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 9 Issue 3 

Full text available: pdf Additional Information: full citation, abstract, references, index terms 

Since the I/O pins of a CPU are a significant source of energy consumption, work has been 
done on developing encoding schemes for reducing switching activity on external buses. 
Modest reductions in switching can be achieved for data and address buses using a number 
of general purpose encoding schemes. However, by exploiting the characteristic of memory 
reference locality, switching activity on the address bus can be reduced by as much as 
66%. Till now no characteristic has been identified ... 

Keywords: I/O pin capacitance, Low power data buses, encoding, internal capacitance, 
switching 
Chitra Natarajan, Bruce Christenson, Faye Briggs 

June 2004 Proceedings of the 3rd workshop on Memory performance issues: in 
conjunction with the 31st international symposium on computer 
architecture WMPI '04 

Full text available: pdf Additional Information: full citation, abstract, references, index terms 

With the growing imbalance between processor and memory performance it becomes more 
and more important to optimize the memory controller features to obtain the maximum 
possible performance out of the memory subsystem. This paper presents a study of the 
performance impact of several memory controller features in multi-processor (MP) server 
environments that use a DDR/DDR2 based memory subsystem. The results from our 
studies show that significant performance improvements can be obtained by careful ... 

Keywords: memory controller, memory subsystem, memory transaction scheduling, multi- 
processors, performance impact, server systems 



5 Queue Management in Network Processors 
I. Papaefstathiou, T. Orphanoudakis, G. Kornaros, C. Kachris, I. Mavroidis, A. Nikologiannis 
March 2005 Proceedings of the conference on Design, Automation and Test in Europe - 
Volume 3 

Full text available: pdf Additional Information: full citation, abstract 

One of the main bottlenecks when designing a network processing system is very often its 
memory subsystem. This is mainly due to the state-of-the-art network links operating at 
very high speeds and to the fact that in order to support advanced Quality of Service (QoS), 
a large number of independent queues is desirable. In this paper we analyze the 
performance bottlenecks of various data memory managers integrated in typical Network 
Processing Units (NPUs). We expose the performance limitations o ... 

Keywords: Network processor, memory management, queue management 
Kevin Kreeger, Arie Kaufman 

July 1999 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics 
hardware 

Full text available: pdf(1.85 MB) Additional Information: full citation, references, citings, index terms 
Keywords: cube architecture, mixing polygons and volumes, ray casting, run-length- 
encoding, volume rendering 



7 Unlocking the Performance of the BlueGene/L Supercomputer 
George Almasi, Siddhartha Chatterjee, Alan Gara, John Gunnels, Manish Gupta, Amy Henning, 
Jose E. Moreira, Bob Walkup 

November 2004 Proceedings of the 2004 ACM/IEEE conference on Supercomputing 

Full text available: pdf Additional Information: full citation, abstract 

The BlueGene/L supercomputer is expected to deliver new levels of application performance 
by providing a combination of good single-node computational performance and high 
scalability. To achieve good single-node performance, the BlueGene/L design includes a 
special dual floating-point unit on each processor and the ability to use two processors per 
node. BlueGene/L also includes both a torus and a tree network to achieve high scalability. 
We demonstrate how benchmarks and applications can take ... 



8 Memory Controller Optimizations for Web Servers 
Scott Rixner 

December 2004 Proceedings of the 37th annual International Symposium on 
Microarchitecture 

Full text available: pdf Additional Information: full citation, abstract 

This paper analyzes memory access scheduling and virtual channels as mechanisms to 
reduce the latency of main memory accesses by the CPU and peripherals in web servers. 
Despite the address filtering effects of the CPU's cache hierarchy, there is significant locality 
and bank parallelism in the DRAM access stream of a web server, which includes traffic 
from the operating system, application, and peripherals. However, a sequential memory 
controller leaves much of this locality and parallelism unex ... 



9 Increasing web server throughput with network interface data caching 
Hyong-youb Kim, Vijay S. Pai, Scott Rixner 

October 2002 Proceedings of the 10th international conference on Architectural 
support for programming languages and operating systems, Volume 30, 37, 36 Issue 5, 10, 5 

Full text available: pdf(1.22 MB) Additional Information: full citation, abstract, references, citings 

This paper introduces network interface data caching, a new technique to reduce local 
interconnect traffic on networking servers by caching frequently-requested content on a 
programmable network interface. The operating system on the host CPU determines which 
data to store in the cache and for which packets it should use data from the cache. To 
facilitate data reuse across multiple packets and connections, the cache only stores 
application-level response content (such as HTTP data), with applica ... 

10 Session P9: interactive volume rendering: Texture hardware assisted rendering of 
time-varying volume data 

Eric B. Lum, Kwan Liu Ma, John Clyne 

October 2001 Proceedings of the conference on Visualization '01 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms 
Publisher Site 

In this paper we present a hardware-assisted rendering technique coupled with a 
compression scheme for the interactive visual exploration of time-varying scalar volume 
data. A palette-based decoding technique and an adaptive bit allocation scheme are 
developed to fully utilize the texturing capability of a commodity 3-D graphics card. Using a 
single PC equipped with a modest amount of memory, a texture capable graphics card, and 
an inexpensive disk array, we are able to render hundreds of time s ... 

Keywords: PC, compression, high performance computing, out-of-core processing, 
scientific visualization, texture hardware, time-varying data, transform encoding, volume 
rendering 



11 Single-Chip MPEG-2 422P@HL CODEC LSI with Multi-Chip Configuration for Large 
Scale Processing beyond HDTV Level 

Hiroe Iwasaki, Jiro Naganuma, Koyo Nitta, Ken Nakamura, Takeshi Yoshitome, Mitsuo Ogura, 
Yasuyuki Nakajima, Yutaka Tashiro, Takayuki Onishi, Mitsuo Ikeda, Makoto Endo 
March 2003 Proceedings of the conference on Design, Automation and Test in Europe: 
Designers' Forum - Volume 2 DATE '03 

Full text available: pdf(362.07 KB) Additional Information: full citation, abstract, index terms 
Publisher Site 

This paper proposes a new architecture for VASA, a single-chip MPEG-2 422P@HL CODEC 
LSI with multi-chip configuration for large scale processing beyond the HDTV level, and 
demonstrates its flexibility and usefulness. This architecture consists of triple encoding 
cores, a decoding core, a multiplexer/de-multiplexer core, and several dedicated 
application-specific hardware modules with a hierarchical flexible communication scheme for 
high-performance data transfer. VASA is the world's first single ... 

12 Embedded systems: applications, solutions and techniques (EMBS): Assessing the 
effect of failure severity, coincident failures and usage-profiles on the reliability of 
embedded control systems 
Frederick T. Sheldon, Kshamta Jerath 

March 2004 Proceedings of the 2004 ACM symposium on Applied computing 

Full text available: pdf(327.91 KB) Additional Information: full citation, abstract, references 

The increasingly ubiquitous use of embedded systems to manage and control our 
technologically (ever-increasing) complex lives makes us more vulnerable than ever before. 
Knowing how reliable such systems are is absolutely necessary especially for safety, 
mission and infrastructure critical applications. This paper presents a structured 
compositional modeling method for assessing reliability based on characteristic data and 
stochastic models. We illustrate this using a classic embedded control sys ... 

Keywords: design, measurement, performance, reliability 



13 Process^ 

multiprocessor SoC design 

Ferid Gharsalli, Damien Lyonnard, Samy Meftali, Frederic Rousseau, Ahmed A. Jerraya 
October 2002 Proceedings of the 15th international symposium on System Synthesis 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms 

In this paper, we present a new methodology for application specific multiprocessor 
system-on-chip design. This approach facilitates the integration of existing components with 
the concept of wrapper. Wrappers allow automatic adaptation of physical interfaces to a 
communication network. We also give a generic architecture to produce these wrappers, 
either for processors or for other specific components such as memory IP. This approach 
has successfully been applied on a low-level image processing ... 
Keywords: embedded memory, memory access, memory wrapper generation, system-on- 
chip 



14 Polygon rendering on a stream architecture 

John D. Owens, William J. Dally, Ujval J. Kapasi, Scott Rixner, Peter Mattson, Ben Mowery 
August 2000 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on 
Graphics hardware 

Full text available: pdf(161.55 KB) Additional Information: full citation, abstract, references, citings, index terms 

The use of a programmable stream architecture in polygon rendering provides a powerful 
mechanism to address the high performance needs of today's complex scenes as well as 
the need for flexibility and programmability in the polygon rendering pipeline. We describe 
how a polygon rendering pipeline maps into data streams and kernels that operate on 
streams, and how this mapping is used to implement the polygon rendering pipeline on 
Imagine, a programmable stream processor. We compare our resul ... 

Keywords: OpenGL, SIMD, graphics hardware, kernels, media processors, polygon 
rendering, stream architecture, stream processing, streams 



15 The design and implementation of a new out-of-core sparse Cholesky factorization 
method 

Vladimir Rotkin, Sivan Toledo 

March 2004 ACM Transactions on Mathematical Software (TOMS), Volume 30 Issue 1 

Full text available: pdf(457.74 KB) Additional Information: full citation, abstract, references, index terms, review 

We describe a new out-of-core sparse Cholesky factorization method. The new method uses 
the elimination tree to partition the matrix, an advanced subtree-scheduling algorithm, and 
both right-looking and left-looking updates. The implementation of the new method is 
efficient and robust. On a 2 GHz personal computer with 768 MB of main memory, the code 
can easily factor matrices with factors of up to 48 GB, usually at rates above 1 Gflop/s. For 
example, the code can factor audikw, currently the lar ... 

Keywords: out-of-core 
Manfred Weiler, Thomas Ertl 

October 2001 Proceedings of the conference on Visualization '01 

Additional Information: full citation, abstract, references, citings, index terms 

In this paper we address the problem of interactively resampling unstructured grids. Three 
algorithms are presented. They all allow adaptive resampling of an unstructured grid on a 
multiresolution hierarchy of arbitrarily sized cartesian grids according to a varying element 
size. Two of the algorithms presented take advantage of hardware accelerated polygon 
rendering and 2D texture mapping. In exploiting new features of modern PC graphics 
adapters, the first algorithm tries to significantly mini ... 
May 2005 Proceedings of the 2nd conference on Computing frontiers 



Full text available: pdf(374.11 KB) Additional Information: full citation, abstract, references, index terms 

This paper introduces Eldorado, a third generation multithreaded architecture. Previous 
Cray multithreaded systems were plagued by unreliable hardware and high costs. Eldorado 
corrects these problems by using many parts built for other commercial systems. Its 
compute processor is a 500 MHz multithreaded processor architecturally similar to the 
MTA-2 processor; but its interconnection network, I/O subsystem, and service processors are 
borrowed from other Cray systems. Eldorado retains the program ... 

Keywords: heterogeneous architectures, multithreaded architectures, multithreaded 
processing, performance studies 



18 Research 
Naga K. Govindaraju, Nikunj Raghuvanshi, Dinesh Manocha 

June 2005 Proceedings of the 2005 ACM SIGMOD international conference on 
Management of data 

Full text available: pdf(658.99 KB) Additional Information: full citation, abstract, references 

We present algorithms for fast quantile and frequency estimation in large data streams 
using graphics processors (GPUs). We exploit the high computation power and memory 
bandwidth of graphics processors and present a new sorting algorithm that performs 
rasterization operations on the GPUs. We use sorting as the main computational component 
for histogram approximation and construction of ε-approximate quantile and frequency 
summaries. Our algorithms for numerical statistics computation on ... 

Keywords: data streams, frequencies, graphics processors, memory bandwidth, quantiles, 
sliding windows, sorting 



19 Scalable high-speed prefix matching 

Marcel Waldvogel, George Varghese, Jon Turner, Bernhard Plattner 

November 2001 ACM Transactions on Computer Systems (TOCS), Volume 19 Issue 4 

Full text available: pdf(933.02 KB) Additional Information: full citation, abstract, references, citings, index terms 

Finding the longest matching prefix from a database of keywords is an old problem with a 
number of applications, ranging from dictionary searches to advanced memory 
management to computational geometry. But perhaps today's most frequent best matching 
prefix lookups occur in the Internet, when forwarding packets from router to router. 
Internet traffic volume and link speeds are rapidly increasing; at the same time, a growing 
user population is increasing the size of routing tables against which p ... 

Keywords: collision resolution, forwarding lookups, high-speed networking 
Ronald J. Brachman, Brian C. Smith 
February 1980 ACM SIGART Bulletin, issue 70 

Full text available: pdf(13.13 MB) Additional Information: full citation, abstract 

In the fall of 1978 we decided to produce a special issue of the SIGART Newsletter devoted 
to a survey of current knowledge representation research. We felt that there were two 
useful functions such an issue could serve. First, we hoped to elicit a clear picture of how 
people working in this subdiscipline understand knowledge representation research, to 
illuminate the issues on which current research is focused, and to catalogue what 